Abstract

Psychophysical studies suggest that face recognition takes place in a narrow band
of low spatial frequencies ("critical band"). Here, we examined the recognition
performance of an artificial face recognition system as a function of the size of
the input images. Recognition performance was quantified with three
discriminability measures: Fisher Linear Discriminant Analysis, Non-parametric
Discriminant Analysis, and mutual information. All three measures revealed a
maximum at the same image size. Since spatial frequency content is a function of
image size, our data consistently predict the range of psychophysically found
frequencies. Our results therefore support the notion that the critical band of
spatial frequencies for face recognition in humans and machines follows from
inherent properties of face images.
1 Introduction



A considerable number of psychophysical studies agree that the mechanisms
for face recognition in humans do not use all available visual information about
faces equally. The visual information we refer to concerns the spatial frequency
composition of face images. Specifically, a narrow band located at the lower
end of the frequency spectrum seems to be optimally suited for the recognition
of previously learnt faces. The frequency band is centered at about 10 to 15
cycles per face, and its bandwidth is about two octaves [1,2,3,4,5,6,7,8]. Thus,
face recognition (and also object recognition in general) has a bandpass
characteristic in humans. Furthermore, this result does not depend, to a first
approximation, on viewing distance [4,8]. The unit for spatial frequencies,
"cycles per face" (or "cycles per object"), expresses this scale invariance.
Nonetheless, to the best of our knowledge, no conclusive explanation for the
bandpass characteristic of face recognition has emerged so far. A recent study
conducted by one of the authors linked this characteristic to inherent
properties of face images. By examining the responses of a model of simple
and complex cells to face images, Keil showed that higher response
amplitudes are obtained at spatial frequencies consistent with the corresponding
psychophysical data [9]. Therefore, the visual system should preferably encode
visual information for processing faces at those spatial frequencies where
the highest signal-to-noise ratio is obtained. Only then will a fine
discrimination between signals be possible. In other words, only then will we be
able to learn and perceive fine differences between otherwise similar faces.
Given this link between the statistics of facial images and psychophysical data,
we reasoned that an artificial face recognition system should reveal properties
similar to those of the human visual system: we expected to see optimal
recognition performance of the artificial system at the same spatial
frequencies as observed with humans. To this end, we explored several measures of
recognition performance. Furthermore, as suggested by the results from ref.
[9], internal face features are the principal cause of the bandpass
characteristic of face recognition. This holds especially true for the eyes, but
also for mouth and nose, albeit to a lesser extent. Consequently, we suppressed
external features (hairline, shoulder regions) in the present study. The results
of the present study suggest that the machine indeed behaves like humans:
recognition performance peaks within a narrow band of low spatial frequencies.

This paper is organized as follows: the next section describes the image pro- 
cessing and the separability criteria that have been considered in the exper- 
iments, section 3 shows the obtained results, and section 4 summarizes and 
concludes this work. 



2 Methods 

2.1 Processing of face images 

For our experiments, we used images from the FRGC Database
(http://www.bee-biometrics.org/). In these images, faces appear against a
uniform, grey background, and with homogeneous illumination conditions for all
subjects. The database consisted of 3772 high quality images from 275 different
persons, where four to 32 images exist for each subject. To perform the
experiments we first aligned and then resized the images such that each
resulting image had an eye-to-eye distance of 50 pixels. Figure 1 shows some
examples of the images normalized in this way.

Fig. 1. Example images from the FRGC database. The images were acquired under
controlled conditions.

Due to their relatively high variability, we decided to suppress external face
features by windowing each face image with a 4-term Blackman-Harris window
("B.H.-window"). This window was compared to 15 alternative windows,
and scored the highest similarity with corresponding images whose external
features were removed manually [9]. For each image, the window was centered
at the position of the nose $(x_{nose}, y_{nose})$. The nose position was
estimated from the coordinates of the left and the right eye, $(x_{le}, y_{le})$
and $(x_{re}, y_{re})$, respectively, and the mouth $(x_{mouth}, y_{mouth})$:



The operator rnd(·) denotes rounding of its argument to the nearest integer
value. Figure 2 illustrates the result of applying the B.H.-window to the images
of Figure 1.
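The windowing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the coefficients are the standard 4-term minimum-sidelobe Blackman-Harris values, while the separable (outer-product) 2-D construction, the window side length `size`, and the function names are assumptions made here for illustration.

```python
import numpy as np

# Standard 4-term (minimum-sidelobe) Blackman-Harris coefficients.
A = (0.35875, 0.48829, 0.14128, 0.01168)

def blackman_harris_1d(n):
    """Symmetric 4-term Blackman-Harris window of length n (n >= 2)."""
    t = 2.0 * np.pi * np.arange(n) / (n - 1)
    return A[0] - A[1] * np.cos(t) + A[2] * np.cos(2 * t) - A[3] * np.cos(3 * t)

def window_face(image, x_nose, y_nose, size):
    """Multiply a 2-D grey-level image by a window centered at the nose.

    Assumption: the 2-D window is the outer product of two 1-D windows.
    Pixels outside the window support are set to zero.
    """
    w = blackman_harris_1d(size)
    win2d = np.outer(w, w)
    out = np.zeros_like(image, dtype=float)
    half = size // 2
    rows, cols = image.shape
    # Clip the window support to the image bounds.
    r0, r1 = max(0, y_nose - half), min(rows, y_nose - half + size)
    c0, c1 = max(0, x_nose - half), min(cols, x_nose - half + size)
    # Matching offsets into the window array.
    wr0 = r0 - (y_nose - half)
    wc0 = c0 - (x_nose - half)
    out[r0:r1, c0:c1] = (image[r0:r1, c0:c1]
                         * win2d[wr0:wr0 + (r1 - r0), wc0:wc0 + (c1 - c0)])
    return out
```

With an odd `size`, the window peak (value 1) coincides with the nose pixel, and the weights fall smoothly to nearly zero at the window border, which is what suppresses hair and shoulders.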

We adopted the following procedure to assess the frequency-dependence of
face recognition with our artificial system. Each image was down-sized to
successively smaller sizes. After down-sizing, we enhanced the highest spatial
frequencies with a modified algorithm for suppressing illumination effects [10].
Specifically, we modified the algorithm such that its output mimicked the
responses of retinal ganglion cells [11], so that contour enhancement
occurred at high spatial frequencies (Figure 3). Consequently, our
"Weber-filtered" images are dominated by the Nyquist frequency associated with
each image size. In this way, computational time could be saved compared to
naive bandpass filtering.
Fig. 2. The images shown here are the result of applying a Blackman-Harris window
to the images of Figure 1. The application of the Blackman-Harris window leads to
a good suppression of external face features (e.g. hair and shoulders).




Fig. 3. Examples of images after applying "Weber-filtering". Compared to the
original images (top row), Weber-filtering leads to enhancement of high spatial
frequencies, and discounts illumination effects at the same time (bottom row).

2.2 Separability Measures 

In order to measure the optimal dimensionality we need to define a formal 
class separability criterion. Class separability can be measured in terms of 
classification accuracy or class distribution. In the first case, the measure is 
highly dependent on the classifier used. In this paper we propose the use of a 
classifier-independent set of statistical measures to validate the psychophysical 
results for human face recognition. 

Two types of class separability measures are described in the literature [12], 
where one is based on scatter matrices, and the other one is based on imposing 
an upper limit on the Bayes error (Bhattacharyya distance). In this paper 
we will focus on the former measure (scatter matrices), because the latter 
approach necessitates the estimation of probability distributions, which is a 
notoriously difficult endeavor. 

A further statistical criterion to measure the separability between classes is
based on mutual information, which is defined as:

$$I(X, Y) = \int\!\!\int p(X, Y) \, \log \frac{p(X, Y)}{p(X)\, p(Y)} \, dX \, dY \qquad (2)$$

where X and Y are two random variables, p(X) and p(Y) are their respective
probability density functions, and p(X, Y) is their joint density. In this paper
we compute mutual information between data points X and classes C. A large value
of mutual information
in this case means that we have much information about the class C given 
the observation X. On the other hand, if the mutual information is zero, then 
both variables are independent. Notice that the computation of mutual infor- 
mation also necessitates the estimation of corresponding probability distribu- 
tions. However, Torkkola [13] recently proposed a method which makes the 
computation of mutual information feasible by using a quadratic divergence 
measure that allows an efficient non-parametric implementation, without prior 
assumptions about class densities. More concretely, his approximation is
inspired by the quadratic Rényi entropy, and the method can be used with
training data sets of the order of tens of thousands of samples.
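To make the interpretation of mutual information concrete, the following minimal sketch computes a plug-in (histogram) estimate of I(X; C) for discrete observations and class labels. It is not Torkkola's quadratic estimator [13], which is what the paper uses; it merely illustrates that the measure is zero for independent variables and large when the observation determines the class. The function name is hypothetical.

```python
import numpy as np

def mutual_information(x, c):
    """Plug-in estimate of I(X; C) in bits for two discrete label arrays."""
    x = np.asarray(x).ravel()
    c = np.asarray(c).ravel()
    # Map arbitrary labels to consecutive integer indices.
    _, xi = np.unique(x, return_inverse=True)
    _, ci = np.unique(c, return_inverse=True)
    xi, ci = xi.ravel(), ci.ravel()
    # Joint histogram, normalized to a joint probability table.
    joint = np.zeros((xi.max() + 1, ci.max() + 1))
    np.add.at(joint, (xi, ci), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    pc = joint.sum(axis=0, keepdims=True)   # marginal of C (row vector)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ pc)[nz])))
```

For example, `mutual_information([0, 0, 1, 1], [0, 0, 1, 1])` gives 1 bit (the observation fully determines a binary class), whereas `mutual_information([0, 1, 0, 1], [0, 0, 1, 1])` gives 0.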



2.3 Discriminant Analysis 

Classic discriminant analysis techniques have often been applied to linear
feature extraction in order to find the projection matrix that preserves the
class separability of data points. Typically, two kinds of statistics have been
used for this purpose: (i) the within-class scatter matrix, which describes the
scatter of samples around their own class mean, and (ii) the between-class
scatter matrix, which describes the scatter of the class means.
In order to formulate a criterion for class separability, each matrix has to be
reduced to a single number. This number should be large when
the between-class scatter is large or when the within-class variation is small.
Several ways of computing this number have been defined in the literature:
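As an illustration, the criterion most commonly paired with FLD is the trace criterion shown below. We assume here that this is the form the paper refers to as $J_1$ and as criterion (3), with $S_1$ and $S_2$ denoting the two scatter matrices; this is an assumption, not a statement taken from the source.

```latex
% Trace criterion (assumed form of J_1): large between-class scatter S_1
% relative to within-class scatter S_2 yields a large value.
J_1 = \operatorname{tr}\left( S_2^{-1} S_1 \right)

% For a linear projection W, the projected criterion
%   J_1(W) = \operatorname{tr}\left( (W^T S_2 W)^{-1} \, W^T S_1 W \right)
% is maximized in closed form by the leading generalized eigenvectors of
%   S_1 w = \lambda \, S_2 w .
```

Under this form, $J_1$ equals the sum of the eigenvalues of $S_2^{-1} S_1$, which is why it can be optimized without iterative search.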



In the classic feature extraction literature the $J_1$ criterion is used, given
that it can be maximized in closed form. The general technique is known as
Fisher Linear Discriminant Analysis (FLD) [14], and uses as $S_1$ the
between-class scatter matrix:

$$S_b = \frac{1}{K} \sum_{k=1}^{K} (m_k - m_0)(m_k - m_0)^T \qquad (6)$$

where $m_k$ is the class-conditional sample mean, $m_0$ is the unconditional
(global) sample mean, and $K$ is the number of classes. Furthermore, for $S_2$
it uses the within-class scatter matrix:

$$S_w = \frac{1}{K} \sum_{k=1}^{K} S_k \qquad (7)$$

where $S_k$ is the class-conditional covariance matrix for class $C_k$
estimated from the data.
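The FLD separability measure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: both scatter matrices are averaged over the number of classes, the criterion is taken to be the trace form tr(S_w^{-1} S_b), and a pseudo-inverse is used because S_w is singular whenever there are more pixels than samples. None of these choices is confirmed by the source, and the function name is hypothetical.

```python
import numpy as np

def fld_separability(X, y):
    """tr(S_w^{-1} S_b) for samples X (n_samples, n_features) and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    m0 = X.mean(axis=0)                      # global sample mean
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for k in classes:
        Xk = X[y == k]
        diff = (Xk.mean(axis=0) - m0)[:, None]
        S_b += diff @ diff.T                 # scatter of class mean around m0
        # Class-conditional covariance (maximum-likelihood normalization).
        S_w += np.cov(Xk, rowvar=False, bias=True).reshape(d, d)
    S_b /= len(classes)
    S_w /= len(classes)
    # Assumption: pseudo-inverse, to tolerate a singular S_w.
    return float(np.trace(np.linalg.pinv(S_w) @ S_b))
```

In an experiment like the one described here, `X` would hold one flattened, windowed image per row at a given image size, and the function would be evaluated once per size to trace out the separability curve.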

The main problem with the classical FLD approach is that the optimization of
the criterion (3) using $S_b$ and $S_w$ is blind to anything beyond second-order
statistics. As a consequence, it may be inaccurate for measuring the
separability of more complex structures. To remedy this, Fukunaga and Mantock
[15] proposed to use a non-parametrically estimated between-class scatter matrix
$S_b$, which generally has full rank. This estimate was used in the
Non-parametric Discriminant Analysis algorithm (NDA), which has been shown to
considerably improve the accuracy of the classic FLD. In a nutshell, the
non-parametric between-class scatter matrix is estimated as follows.

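Let $x$ be a data point in $X$ with class label $C_j$, and denote by
$x^{\mathrm{class}}$ the subset of the $k$ nearest neighbors of $x$ among the
data points in $X$ with class labels different from $C_j$. We calculate a local
between-class matrix for $x$ as:

$$\Delta_b^x = \frac{1}{k} \sum_{z \in x^{\mathrm{class}}} (z - x)(z - x)^T \qquad (8)$$

The estimate of the between-class scatter matrix $\hat{S}_b$ is found as the
average of the local matrices:

$$\hat{S}_b = \frac{1}{N} \sum_{z \in X} \Delta_b^z \qquad (9)$$

where $N$ is the number of data points in $X$. The resulting $\hat{S}_b$ is used
in the criterion (3) as the new $S_1$.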
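The non-parametric estimate can be sketched as follows. This is an illustrative reading of the procedure described above, not the authors' code: the local matrices are assumed to be normalized by the number of neighbors actually used, the final estimate is a plain average over all data points, and distances are Euclidean. The function name and the default `k` are hypothetical.

```python
import numpy as np

def nda_between_scatter(X, y, k=3):
    """Non-parametric between-class scatter from k nearest other-class neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    S_b = np.zeros((d, d))
    for i in range(n):
        others = X[y != y[i]]                       # points from all other classes
        dist = np.sum((others - X[i]) ** 2, axis=1) # squared Euclidean distances
        nearest = others[np.argsort(dist)[:k]]      # k nearest extra-class points
        diffs = nearest - X[i]
        # Local between-class matrix: mean outer product of (z - x).
        S_b += diffs.T @ diffs / len(nearest)
    return S_b / n                                   # average of the local matrices
```

Because every sample contributes its own neighborhood directions, the resulting matrix is generally of full rank, unlike the parametric between-class scatter, whose rank is bounded by the number of classes minus one.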



3 Results 



We used the three separability measures described in the previous section
(FLD, NDA and MI) for evaluating the recognition performance as a function
of image size. To this end, we randomly selected 20 subjects, each of them
having more than 25 images, and computed the different geometrical and
statistical measures described above. All numerical experiments were
carried out with the original image set, and with a second image set obtained by
applying Weber-filtering [10]. In this way we were able to address how our
results depend on the presence of low spatial frequencies in the down-sized
images.

Figures 5, 6, and 7 show the dependence of the FLD (Fisher Linear
Discriminant Analysis), NDA (Non-parametric Discriminant Analysis), and MI
(mutual information) measures, respectively, on image size. Each of the three
measures reveals a distinct maximum at approximately the same image size
(around 22 x 22 pixels). As one can appreciate from the sample images shown
in Figure 4, this image size translates to roughly 8 to 10 cycles per face,
which compares favorably with the psychophysical results described in the
introduction. The recognition performance (in terms of discriminability) also
depends to some degree on whether the original images or the Weber-filtered
face images are used. Specifically, the maxima tend to become
more pronounced with Weber-filtering. At the same time, the amplitudes of
NDA and MI (but not FLD) grow, indicating an increased recognition
performance when only a small band of spatial frequencies is used. This behavior
of our artificial face recognition system is also consistent with corresponding
psychophysical observations with humans: the bandwidth for face recognition
was estimated to be around two octaves (e.g., [7]).
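The link between image size and spatial frequency can be made explicit with a short calculation. By the sampling theorem, an image that is N pixels wide can represent at most N/2 cycles across its width (the Nyquist limit); how many of these fall on the face depends on how much of the image the face occupies. The parameter `face_fraction` below is a hypothetical illustration value and is not reported in the paper.

```python
def max_cycles_per_face(image_size_px, face_fraction=1.0):
    """Upper bound on cycles per face for a square image of the given size.

    image_size_px : side length of the image in pixels
    face_fraction : assumed fraction of the image side covered by the face
    """
    nyquist_cycles_per_image = image_size_px / 2.0  # Nyquist limit
    return nyquist_cycles_per_image * face_fraction

# A 22-pixel-wide image carries at most 11 cycles per image width.
print(max_cycles_per_face(22))
# If the face covered, say, 80 % of the width (an assumed value), the bound
# would fall in the 8 to 10 cycles-per-face range quoted in the text.
print(max_cycles_per_face(22, face_fraction=0.8))
```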




Fig. 4. Examples at the image size associated with maximum discrimination
(22 x 22 pixels). This image size translates to approximately 5 to 8 cycles per
face width, and 6 to 12 cycles per face height, and is thus in the range of the
corresponding psychophysical data. Notice that here the re-sized original images
are shown (i.e., without application of the Blackman-Harris window) to achieve
better visibility.



Fig. 5. FLD (Fisher Linear Discriminant Analysis) measure. Left plot shows results
with the original images, right plot with Weber-filtered images.

Fig. 6. NDA (Non-parametric Discriminant Analysis) measure. Left plot shows
results with the original images, right plot with Weber-filtered images.

Fig. 7. MI (mutual information) measure. Left plot shows results with the original
images, right plot with Weber-filtered images.



4 Summary and Conclusions

Psychophysical studies suggest that for face recognition, human observers
make use of a narrow band at low spatial frequencies (10 to 15 cycles per face,
bandwidth two octaves). Here, we evaluated the recognition performance of
an artificial face recognition system as a function of image size. Recognition
performance was measured by three different measures (Fisher Linear
Discriminant Analysis, Non-parametric Discriminant Analysis, and mutual
information), which all indicated a performance maximum of the artificial system
at an image size of about 22 x 22 pixels. This corresponds to spatial frequencies
of around 8 to 10 cycles per face, which compares well to the range of measured
psychophysical data (although the psychophysical values are somewhat
underestimated). We also found an effect of the presence of low spatial
frequencies in the image. Recognition performance even seems to increase when
low spatial frequencies are suppressed by Weber-filtering. In other words,
decreasing the bandwidth of the spectrum of spatial frequencies in the face
images increases the recognition performance, at least when measured by
Non-parametric Discriminant Analysis and mutual information. Such behavior is
again in line with the narrow band of critical spatial frequencies found
psychophysically. The present study lends further support to the findings of Keil
[9] in that the stimuli (i.e., face images) provide the explanation for the
preference for a narrow spatial frequency band in both human and artificial face
recognition. As a consequence, artificial face recognition systems should focus
on these frequencies to achieve optimal recognition performance (in terms
of class separability). Because these critical spatial frequencies correspond to
small image patches, a further advantage is an economical use of resources for
both processing and storing faces.